Abstract

This  paper  presents  two  complementary  ideas  re¬ 
lating  the  study  of  human  development  and  the 
construction  of  intelligent  artifacts.  First,  the  use 
of  developmental  models  will  be  a  critical  require¬ 
ment  in  the  construction  of  robotic  systems  that 
can  acquire  a  large  repertoire  of  motor,  percep¬ 
tual,  and  cognitive  capabilities.  Second,  robotic 
systems  can  be  used  as  a  test-bed  for  evaluating 
models  of  human  development  much  in  the  same 
way  that  simulation  studies  are  currently  used 
to  evaluate  cognitive  models.  To  further  explore 
these  ideas,  two  examples  from  the  author’s  own 
work  will  be  presented:  the  use  of  developmental 
models  of  hand-eye  coordination  to  simplify  the 
task  of  learning  to  reach  for  a  visual  target  and 
the  use  of  a  humanoid  robot  to  evaluate  models 
of  normal  and  abnormal  social  skill  development. 

Introduction 

Research  on  human  development  and  research  on  the 
construction  of  intelligent  artifacts  can  and  should  be 
complementary.  Studies  of  human  development  have 
produced  a  great  variety  of  theories,  models,  and  exper¬ 
imental  constructs  which  have  long  been  an  inspiration 
for  implementations  of  robotic  systems.  Research  from 
human  development  has  often  served  as  the  inspiration 
for  both  challenging  research  questions  and  useful  task 
decompositions.  However,  computational  studies  of  de¬ 
velopmental  processes  have  had  little  impact  on  the  the¬ 
oretical  constructs  present  in  developmental  psychology 
today,  and  the  influence  of  robotics  on  developmental 
studies has been almost completely absent. In this
paper, I will argue not only that robotics will come to
rely upon human development for inspiration and
practical theories, but also that human development will
profit from the evaluation and experimentation
opportunities that robotics offers. In the next section,
I will briefly describe the practical and theoretical
ways in which developmental models aid in the
construction of intelligent artifacts, focusing on the
simple hand-eye coordination that our group has
implemented on a humanoid robot. In the final section,
I will discuss work in progress on using a robotic
platform as a unique test-bed to evaluate models of
social skill development for both normal and autistic
individuals.

How  Developmental  Psychology 
Impacts  Robotics 

Developmental  psychology  is  most  typically  employed 
in robotics research as a source of inspiration.
Questions that have been addressed in the developmental
psychology literature (such as how infants learn
to orient to salient stimuli and how children learn to
navigate unfamiliar locations) have focused on issues
that have also been of interest to the robotics
community. Models from developmental psychology often offer
behavioral  decomposition  and  observations  about  task 
performance  which  may  provide  an  outline  for  a  soft¬ 
ware  architecture.  Techniques  for  studying  skill  pro¬ 
gressions  have  also  been  adapted  as  evaluation  tech¬ 
niques  for  robotics  systems. 

However,  a  developmental  approach  to  robot  con¬ 
struction  also  provides  practical  benefits.  Human  de¬ 
velopment  exploits  a  gradual  increase  in  both  internal 
complexity  (perceptual  and  motor)  and  external  com¬ 
plexity  (task  and  environmental  complexity  regulated 
by  the  instructor)  to  optimize  the  acquisition  of  new 
skills.  For  example,  infants  are  born  with  low  acu¬ 
ity  vision  which  simplifies  the  visual  input  they  must 
process.  The  infant’s  visual  performance  develops  in 
step  with  their  ability  to  process  the  influx  of  stimula¬ 
tion  (Johnson).  The  same  is  true  for  the  motor  system. 
Newborn  infants  do  not  have  independent  control  over 
each  degree  of  freedom  of  their  limbs,  but  through  a 
gradual  increase  in  the  granularity  of  their  motor  con¬ 
trol  they  learn  to  coordinate  the  full  complexity  of  their 
bodies. A process in which the acuity of both sensory
and motor systems is gradually increased significantly
reduces the difficulty of the learning problem (Thelen
& Smith). The caregiver also acts to gradually increase
the  task  complexity  by  structuring  and  controlling  the 
complexity  of  the  environment.  Our  group  has  previ¬ 
ously  argued  that  developmental  approaches  to  robot 
construction  produce  systems  that  can  scale  naturally 
to  more  complex  tasks  and  problem  domains  by  opti¬ 
mizing  learning  in  a  similar  way  (Brooks,  (Ferrell),  Irie, 
Kemp, Marjanovic, Scassellati & Williamson). By
exploiting a gradual increase in complexity both internal
and external, while reusing structures and information
gained from previously learned behaviors, increasingly
sophisticated behaviors can be acquired (Ferrell;
Scassellati).

Figure 1: Cog, an upper-torso humanoid robot with
twenty-one degrees of freedom and a variety of sensory
systems including visual, auditory, tactile, kinesthetic,
and vestibular systems.

Example  #1  :  Hand-Eye  Coordination 

Diamond  (1990)  has  shown  that  infants  between  five 
and  twelve  months  of  age  progress  through  a  number 
of  distinct  phases  in  the  development  of  visually  guided 
reaching.  In  this  progression,  infants  in  later  phases 
consistently  demonstrate  more  sophisticated  reaching 
strategies  to  retrieve  a  toy  in  more  challenging  scenar¬ 
ios.  As  the  infant’s  reaching  competency  develops,  later 
stages incrementally improve upon the competency
afforded by the previous stages. Within our group,
Marjanovic, Scassellati & Williamson (1996) applied a
similar bootstrapping technique to enable a humanoid robot
(shown  in  Figure  1)  to  learn  to  point  to  a  visual  target. 
This  pointing  behavior  is  learned  over  many  repeated 
trials  without  human  supervision,  using  gradient  de¬ 
scent  methods  to  train  forward  and  inverse  mappings 
between a visual parameter space and an arm position
parameter space. Without a developmental perspective,
the problem of pointing to a visual target is
a degenerate $R^2 \to R^4$ sensory-motor mapping
problem with no obvious training signal; the position of
the target in the visual coordinates (a two-dimensional
quantity) must be converted into an arm trajectory for
the four degrees of freedom in the arm. Using the
behavioral decomposition Diamond (1990) observed in
infants, Marjanovic et al. (1996) reduced this
$R^2 \to R^4$ function into a pair of $R^2 \to R^2$
learned functions and a fixed $R^2 \to R^4$
non-degenerate function with obvious error signals.


From  an  external  perspective,  the  robot’s  behavior 
is  quite  rudimentary.  Given  a  visual  stimulus,  typically 
by  a  researcher  waving  an  object  in  front  of  its  cameras, 
the  robot  saccades  to  foveate  on  the  target,  and  then 
reaches  out  its  arm  toward  the  target.  Early  reaches 
are  inaccurate,  and  often  in  the  wrong  direction  alto¬ 
gether,  but  after  a  few  hours  of  practice  the  accuracy 
improves  drastically.  To  reach  to  a  visual  target,  the 
robot must learn the mapping from the target’s image
coordinates $\mathbf{x} = (x, y)$ to the coordinates of
the arm motors $\mathbf{a} = (a_0, \ldots, a_5)$ (see
Figure 2). To achieve this, the robot first learns to
foveate the target using a saccade map
$S : \mathbf{x} \to \mathbf{e}$ which relates positions
in the camera image with the motor commands necessary
to foveate the eye at that location
($\mathbf{e} = (\mathit{pan}, \mathit{tilt})$). Once the target
is  foveated,  the  robot  must  learn  a  ballistic  movement 
mapping  head-centered  coordinates  e  to  arm-centered 
coordinates  a.  To  simplify  the  dimensionality  problems 
involved  in  controlling  a  six  degree-of-freedom  arm,  arm 
positions  are  specified  as  a  linear  combination  of  basis 
posture  primitives. 
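The idea of specifying an arm position as a linear combination of basis posture primitives can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of Marjanovic et al. (1996): the four postures, their joint angles, and the function name `blend_postures` are hypothetical, and the weights are assumed to be non-negative and are normalized to sum to one.

```python
from typing import List, Sequence

# Hypothetical basis postures: six joint angles (radians) each, for a
# six degree-of-freedom arm. The values are invented for illustration.
BASIS_POSTURES: List[List[float]] = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # arm at rest
    [1.2, 0.0, 0.0, 0.4, 0.0, 0.0],    # reach forward
    [0.6, 0.8, 0.0, 0.4, 0.0, 0.0],    # reach to one side
    [0.6, -0.8, 0.0, 0.4, 0.0, 0.0],   # reach to the other side
]


def blend_postures(
    weights: Sequence[float],
    postures: Sequence[Sequence[float]] = BASIS_POSTURES,
) -> List[float]:
    """Return joint angles as a normalized weighted sum of basis postures."""
    if len(weights) != len(postures):
        raise ValueError("need exactly one weight per basis posture")
    if any(w < 0 for w in weights):
        raise ValueError("weights are assumed to be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    n_joints = len(postures[0])
    return [
        sum(w * p[j] for w, p in zip(weights, postures)) / total
        for j in range(n_joints)
    ]
```

With this parameterization a learner only has to choose a few weights rather than six independent joint angles, which is the dimensionality reduction the text describes.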


Both  the  saccade  map  and  the  ballistic  arm  map  are 
constructed  by  on-line  learning  algorithms.  The  sac¬ 
cade  map  is  trained  using  a  correlation-based  tracker. 
The  error  signal  is  a  vector  in  image  coordinates,  and 
can be used to directly train the mapping. Once the
saccade map has been trained, the ballistic map is trained
by comparing arm motor command signals with
visual motion feedback clues to localize the arm in
visual space (see Figure 3). By visually tracking the
moving  arm,  we  can  obtain  its  final  position  in  im¬ 
age  coordinates.  The  vector  from  the  tip  of  the  arm 
in  the  image  to  the  center  of  the  image  is  the  visual 
error  signal,  which  can  be  converted  into  an  error  in 
gaze  coordinates  using  the  saccade  mapping.  In  this 
way,  the  knowledge  gained  from  learning  to  foveate  a 
target  transforms  the  ballistic  arm  error  into  an  er¬ 
ror  signal  that  can  be  used  to  train  the  arm  directly. 
This  re-use  allows  the  learning  algorithms  to  operate 
continually,  in  real  time,  and  in  an  unstructured  “real- 
world”  environment  without  using  explicit  world  coor¬ 
dinates  or  complex  kinematics.  This  technique  success¬ 
fully  trains  a  reaching  behavior  within  approximately 
three  hours  of  self-supervised  training.  Video  clips 
of  Cog  reaching  to  a  visual  target  are  available  from 
http://www.ai.mit.edu/projects/cog/, and additional
details on this method can be found in Marjanovic
et al. (1996).

Figure 2: Reaching to a visual target is the product of two sub-skills: foveating a target and generating a ballistic reach
from  that  eye  position.  Image  correlation  can  be  used  to  train  a  saccade  map  which  transforms  retinal  coordinates 
into  gaze  coordinates  (eye  positions).  This  saccade  map  can  then  be  used  in  conjunction  with  motion  detection  to 
train  a  ballistic  map  which  transforms  gaze  coordinates  into  a  ballistic  reach. 
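The two-stage training described above, in which the saccade map is learned first and then reused to convert a visual reaching error into gaze coordinates, can be sketched in simulation. This is a minimal sketch under strong assumptions, not the method of Marjanovic et al. (1996): both maps are reduced to 2x2 linear maps trained with a delta rule (a gradient-descent-style update), the arm command is treated as a two-dimensional quantity, and the "plant" gains `EYE_GAIN` and `ARM_GAIN` are hypothetical stand-ins for the robot's unknown eye and arm geometry.

```python
import random


class LinearMap:
    """A 2x2 linear map trained with a delta rule (stand-in for a learned map)."""

    def __init__(self):
        self.w = [[0.0, 0.0], [0.0, 0.0]]

    def apply(self, v):
        return [self.w[i][0] * v[0] + self.w[i][1] * v[1] for i in range(2)]

    def update(self, v, err, lr):
        # Move the output produced for input v in the direction of err.
        for i in range(2):
            for j in range(2):
                self.w[i][j] += lr * err[i] * v[j]


# Hypothetical plant properties, unknown to the learner.
EYE_GAIN = (0.5, 0.8)    # eye motor units needed per unit of image offset
ARM_GAIN = (2.0, 1.25)   # gaze units reached per unit of arm command


def train(trials=3000, lr=0.2, seed=0):
    rng = random.Random(seed)
    saccade, ballistic = LinearMap(), LinearMap()

    # Stage 1: learn the saccade map from image coordinates to eye commands.
    for _ in range(trials):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        e = saccade.apply(x)
        # Residual image offset of the target after the saccade, as a
        # tracker would report it: an error vector in image coordinates.
        residual = [x[i] - e[i] / EYE_GAIN[i] for i in range(2)]
        saccade.update(x, residual, lr)

    # Stage 2: learn the ballistic map from gaze coordinates to arm commands.
    for _ in range(trials):
        e = [rng.uniform(-1, 1), rng.uniform(-1, 1)]  # gaze of foveated target
        a = ballistic.apply(e)
        hand_gaze = [ARM_GAIN[i] * a[i] for i in range(2)]
        # Where the hand appears in the image relative to the image center.
        hand_image = [(hand_gaze[i] - e[i]) / EYE_GAIN[i] for i in range(2)]
        # Vector from the hand to the image center, converted into gaze
        # coordinates by reusing the already trained saccade map.
        to_center = [-h for h in hand_image]
        gaze_error = saccade.apply(to_center)
        ballistic.update(e, gaze_error, lr)

    return saccade, ballistic
```

After training, `ballistic.apply(e)` yields an arm command whose simulated reach lands at gaze position `e`. Neither update uses world coordinates or an explicit kinematic model; the second stage depends only on the map learned in the first.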


How  Robotics  Can  Impact 
Developmental  Psychology 

I  have  proposed  that  humanoid  robotics  research  can 
also  investigate  scientific  questions  about  the  nature  of 
human  intelligence  (Scassellati;  Scassellati).  Humanoid 
robots  can  serve  as  a  unique  tool  to  investigators  in  the 
cognitive  sciences.  Robotic  implementations  of  cogni¬ 
tive,  behavioral,  and  developmental  models  provide  a 
test-bed  for  evaluating  the  predictive  power  and  validity 
of  those  models.  An  implemented  robotic  model  allows 
for  more  accurate  testing  and  validation  of  these  mod¬ 
els  through  controlled,  repeatable  experiments.  Slight 
experimental  variations  can  be  used  to  isolate  and  eval¬ 
uate  single  factors  (whether  environmental  or  internal) 
independent  of  many  of  the  confounds  that  affect  nor¬ 
mal  behavioral  observations.  Experiments  can  also  be 
repeated  with  nearly  identical  conditions  to  allow  for 
easy  validation.  Further,  internal  model  structures  can 
be  manipulated  to  observe  the  quantitative  and  qual¬ 
itative  effects  on  behavior.  A  robotic  model  can  also 
be  subjected  to  controversial  testing  that  is  potentially 
hazardous,  costly,  or  unethical  to  conduct  on  humans; 
the  “boundary  conditions”  of  the  models  can  be  ex¬ 
plored  by  testing  alternative  learning  and  environmen¬ 
tal  conditions.  Finally,  a  robotic  model  can  be  used  to 
suggest  and  evaluate  potential  intervention  strategies 
before  applying  them  to  human  subjects. 

Example  #2  :  Development  of  Joint 
Reference 

One  of  the  critical  precursors  to  social  learning  in  hu¬ 
man  development  is  the  ability  to  selectively  attend 
to  an  object  of  mutual  interest.  Humans  have  a  large 
repertoire  of  social  cues,  such  as  gaze  direction,  point¬ 
ing  gestures,  and  postural  cues,  that  all  indicate  to  an 
observer  which  object  is  currently  under  consideration. 
These abilities, collectively named mechanisms of joint
(or shared) attention, are vital to the normal
development of social skills in children. Joint attention to
objects  and  events  in  the  world  serves  as  the  initial 
mechanism  for  infants  to  share  experiences  with  others 
and  to  negotiate  shared  meanings.  Joint  attention  is 
also  a  mechanism  for  allowing  infants  to  leverage  the 
skills  and  knowledge  of  an  adult  caretaker  in  order  to 
learn  about  their  environment,  in  part  by  allowing  the 
infant  to  manipulate  the  behavior  of  the  caretaker  and 
in  part  by  providing  a  basis  for  more  complex  forms  of 
social  communication  such  as  language  and  gestures. 

Joint  attention  has  been  investigated  by  researchers 
in  a  variety  of  fields.  Experts  in  child  development 
are  interested  in  these  skills  as  part  of  the  normal 
developmental  course  that  infants  acquire  extremely 
rapidly,  and  in  a  stereotyped  sequence  (Scaife  &  Bruner; 
Moore & Dunham). Additional work on the etiology
and behavioral manifestations of pervasive
developmental disorders such as autism and Asperger’s
syndrome has focused on disruptions to joint attention
mechanisms and demonstrated how vital these skills
are in human social interactions (Cohen & Volkmar;
Baron-Cohen). Philosophers have been interested in
joint attention both as an explanation for issues of
contextual grounding and as a precursor to a theory
of other minds (Whiten; Dennett). Evolutionary
psychologists and primatologists have focused on
the evolution of these simple social skills throughout
the animal kingdom both as a means of evaluating
the presence of theory of mind and as a measure
of social functioning (Povinelli & Preuss; Hauser;
Premack).

The inspiration for an implementation of joint
reference comes from Baron-Cohen (1995). Baron-Cohen’s
model gives a coherent account of the observed
developmental stages of joint attention behaviors in both
normal and blind children, the observed deficiencies in joint
attention of children with autism, and a partial
explanation of the observed abilities of primates on joint
attention tasks. The model provides both a skill
decomposition and a potential system architecture for
constructing a system that can recognize and respond to
eye contact, gaze direction, and imperative and
declarative pointing gestures.

Figure 3: Generation of error signals from a single reaching trial. Once a visual target is foveated, the gaze coordinates
are transformed into a ballistic reach by the ballistic map. By observing the position of the moving hand, we can
obtain a reaching error signal in image coordinates, which can be converted back into gaze coordinates using the
saccade map.

A  robotic  implementation  of  Baron-Cohen’s  model  is 
currently  under  construction  (Scassellati).  We  have  al¬ 
ready  implemented  a  perceptual  system  capable  of  find¬ 
ing  faces  and  eyes  (Scassellati).  The  system  first  locates 
potential  face  locations  in  the  peripheral  image  using  a 
template-based  matching  algorithm.  Once  a  potential 
face  location  has  been  identified,  the  robot  saccades  to 
that target using the saccade mapping S described
earlier. The location of the face in peripheral image
coordinates ($p_{(x,y)}$) is then mapped into foveal image
coordinates ($f_{(x,y)}$) using a second learned mapping, the
foveal map $F : p_{(x,y)} \to f_{(x,y)}$. The location of the face
within  the  peripheral  image  can  then  be  used  to  extract 
the  sub-image  containing  the  eye  for  further  processing. 
This  technique  has  been  successful  at  locating  and  ex¬ 
tracting  sub-images  that  contain  eyes  under  a  variety 
of  conditions  and  from  many  different  individuals.  Ad¬ 
ditional  modules  including  a  context-sensitive  attention 
system  (Breazeal  &  Scassellati),  a  system  of  human-like 
eye  and  neck  movement  (Brooks,  Breazeal,  Marjanovic, 
Scassellati  &  Williamson),  and  a  system  for  regulating 
interaction  intensities  (Breazeal  &  Scassellati)  have  also 
been  implemented. 
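The first step of the face-finding pipeline, locating a candidate region by matching a template against the peripheral image, can be illustrated with a small sketch. The text does not specify the matching algorithm, so normalized cross-correlation is substituted here as a generic stand-in; the function names `ncc` and `find_candidate` and the list-of-lists grayscale image format are assumptions made for the example.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation between two equally sized gray patches."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    n = len(t)
    mean_p, mean_t = sum(p) / n, sum(t) / n
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t)
    )
    # A constant patch has no structure to correlate with; score it as 0.
    return num / den if den > 0 else 0.0


def find_candidate(image, template):
    """Return (row, col, score) of the best-matching template position."""
    th, tw = len(template), len(template[0])
    best_score, best_r, best_c = -2.0, 0, 0
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_r, best_c = score, r, c
    return best_r, best_c, best_score
```

In a pipeline like the one described, the returned position would serve as the input to the saccade map, and the foveal image would then be searched for the eye region.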

Advantages of a Robotic Implementation

A robotic approach to studies of joint attention and
social skill development has three main advantages. First,
human  observers  readily  anthropomorphize  their  social 
interactions  with  a  human-like  robot.  Second,  the  con¬ 
struction  of  a  physically  embodied  system  may  be  com¬ 
putationally  simpler  than  the  construction  of  a  simu¬ 
lation  of  sufficient  detail.  Third,  the  skills  that  must 
be  implemented  to  test  these  models  are  useful  for  a 
variety  of  other  practical  robotics  tasks. 

Interactions  with  a  robotic  agent  are  easily  anthropo¬ 
morphized  by  children  and  adults.  An  embodied  system 
with  human  form  allows  for  natural  social  interactions 
to  occur  without  any  additional  training  or  prompting. 
Observers  need  not  be  trained  in  special  procedures  nec¬ 
essary  to  interact  with  the  robot;  the  same  behaviors 
that  they  use  for  interacting  with  other  people  allow 
them  to  interact  naturally  with  the  robot.  In  our  expe¬ 
rience,  and  in  the  empirical  studies  by  Reeves  &  Nass 
(1996),  people  readily  treat  a  robot  as  if  it  were  another 
person.  Human  form  also  provides  important  task  con¬ 
straints  on  the  behavior  of  the  robot.  For  example,  to 
observe  an  object  carefully,  our  robot  must  orient  its 
head  and  eyes  toward  a  target.  These  task  constraints 
allow  observers  to  easily  interpret  the  behavior  of  the 
robot. 

Figure 4: Examples of successful face and eye detections. The system locates faces in the peripheral camera, saccades
to that position, and then extracts the eye image from the foveal camera. The position of the eye is inexact, in part
because the human subjects are not motionless.

A second reason for choosing a robotic implementation
is that physical embodiment may actually simplify
the computation necessary for this task. The direct
physical coupling between action and perception
reduces the need for an intermediary representation.
For an embodied system, internal representations can
be ultimately grounded in sensory-motor interactions
with the world (Lakoff); there is no need to model
aspects of the environment that can simply be
experienced (Brooks; Brooks). The effects of gravity, friction,
and  natural  human  interaction  are  obtained  for  free, 
without  any  computation.  Embodied  systems  can  also 
perform  some  complex  tasks  in  relatively  simple  ways 
by  exploiting  the  properties  of  the  complete  system. 
For  example,  when  putting  a  jug  of  milk  in  the  refriger¬ 
ator,  you  can  exploit  the  pendulum  action  of  your  arm 
to  move  the  milk  (Greene).  The  swing  of  the  jug  does 
not  need  to  be  explicitly  planned  or  controlled,  since  it 
is  the  natural  behavior  of  the  system.  Instead  of  hav¬ 
ing  to  plan  the  whole  motion,  the  system  only  has  to 
modulate,  guide  and  correct  the  natural  dynamics. 

Third,  the  social  skills  that  we  must  implement  to 
test  these  models  are  important  from  an  engineer¬ 
ing  perspective.  A  robotic  system  that  can  recognize 
and  engage  in  joint  attention  behaviors  will  allow  for 
human-machine  interactions  that  have  previously  not 
been  possible.  The  robot  would  be  capable  of  learning 
from  an  observer  using  normal  social  signals  in  the  same 
way  that  human  infants  learn;  no  specialized  training  of 
the  observer  would  be  necessary.  The  robot  would  also 
be  capable  of  expressing  its  internal  state  (emotions, 
desires,  goals,  etc.)  through  social  interactions  with¬ 
out  relying  upon  an  artificial  vocabulary.  Further,  a 
robot  that  can  recognize  the  goals  and  desires  of  others 
will  allow  for  systems  that  can  more  accurately  react  to 
the  emotional,  attentional,  and  cognitive  states  of  the 
observer,  can  learn  to  anticipate  the  reactions  of  the 
observer,  and  can  modify  its  own  behavior  accordingly. 

Implementing this progression for a robotic system
provides a simple means of bootstrapping behaviors.
The  capabilities  used  in  detecting  and  maintaining  eye 
contact  can  be  extended  to  provide  a  rough  angle  of 
gaze.  By  tracking  along  this  angle  of  gaze,  and  watch¬ 
ing  for  objects  that  have  salient  color,  intensity,  or  mo¬ 
tion,  our  robot  can  mimic  the  ecological  strategy.  From 
an  ecological  mechanism,  we  can  refine  the  algorithms 
for  determining  gaze  and  add  mechanisms  for  determin¬ 
ing  vergence.  A  rough  geometric  strategy  can  then  be 
implemented,  and  later  refined  through  feedback  from 
the  caretaker.  A  representational  strategy  requires  the 
ability  to  maintain  information  on  salient  objects  that 
are  outside  of  the  field  of  view  including  information  on 
their  appearance,  location,  size,  and  salient  properties. 
The  implementation  of  this  strategy  requires  us  to  make 
assumptions  about  the  important  properties  of  objects 
that  must  be  included  in  a  representational  structure, 
a  topic  beyond  the  scope  of  this  paper. 
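The first step of this progression, tracking along a rough angle of gaze and stopping at the first salient object, can be sketched as a simple geometric search. This is an illustrative sketch only: the two-dimensional scene, the `SceneObject` type, the single combined saliency score, and the tolerance and threshold values are all assumptions rather than details of the robot's implementation.

```python
import math
from typing import NamedTuple, Optional, Sequence, Tuple


class SceneObject(NamedTuple):
    name: str
    x: float
    y: float
    saliency: float  # hypothetical combined color/intensity/motion score


def follow_gaze(
    origin: Tuple[float, float],
    gaze_angle: float,
    objects: Sequence[SceneObject],
    tolerance: float = math.radians(15),
    min_saliency: float = 0.5,
) -> Optional[SceneObject]:
    """Return the nearest sufficiently salient object near the gaze ray."""
    best, best_dist = None, math.inf
    for obj in objects:
        dx, dy = obj.x - origin[0], obj.y - origin[1]
        dist = math.hypot(dx, dy)
        if dist == 0 or obj.saliency < min_saliency:
            continue
        # Signed angular difference between the bearing and the gaze,
        # wrapped into the range [-pi, pi].
        diff = math.atan2(dy, dx) - gaze_angle
        diff = math.atan2(math.sin(diff), math.cos(diff))
        if abs(diff) <= tolerance and dist < best_dist:
            best, best_dist = obj, dist
    return best
```

A wide tolerance corresponds to the rough angle of gaze mentioned above; narrowing it, or adding depth from vergence, would move the sketch toward the geometric strategy.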

Evaluating the Robotic Implementation

A robotic implementation of a behavioral model provides
a  standardized  evaluation  mechanism.  Behavioral  ob¬ 
servation  and  classification  techniques  that  are  used  on 
children  and  adults  can  be  applied  to  the  behavior  of 
our  robot  with  only  minimal  modifications.  Because 
of  their  use  in  the  diagnosis  and  assessment  of  autism 
and related disorders, evaluation tools for joint
attention mechanisms, such as the Vineland Adaptive
Behavior Scales, the Autism Diagnostic Interview, and the
Autism Diagnostic Observation Schedule, have been
extensively studied (Sparrow, Marans, Klin, Carter,
Volkmar & Cohen; Powers). With the evaluations obtained
from  these  tools,  the  success  of  our  implementation  ef¬ 
forts  can  be  tested  using  the  same  criteria  that  are  ap¬ 
plied  to  human  behaviors.  The  behavior  of  the  complete 
robotic  implementation  can  be  compared  with  develop¬ 
mental  data  from  normal  children.  Furthermore,  by 
inhibiting  specific  modules  within  the  model,  the  robot 
should  produce  behavior  that  can  be  compared  with 
developmental  data  from  autistic  children.  With  these 
evaluation  techniques,  we  can  determine  the  extent  to 
which  our  model  matches  the  observed  biological  data. 
However,  what  conclusions  can  we  draw  from  the  out¬ 
comes  of  these  studies? 

One  possible  outcome  is  that  our  robotic  implementa¬ 
tion  will  match  the  expected  behavior  evaluations,  that 
is,  the  complete  system  will  demonstrate  normal  uses 
of joint attention. In this case, our efforts will have
provided evidence that the model is internally consistent
in producing the desired behaviors, but they say nothing
about the underlying biological processes. We can
verify that the model provides a possible explanation for
the  normal  (and  abnormal)  development  of  joint  atten¬ 
tion,  but  we  cannot  verify  that  this  model  accurately 
reflects  what  happens  in  biology. 

If  the  robotic  implementation  does  not  meet  the  same 
behavioral  criteria,  the  reasons  for  the  failure  are  signifi¬ 
cant.  The  implementation  may  be  unsuccessful  because 
of  an  internal  logical  flaw  in  the  model.  In  this  case,  we 
can  identify  shortcomings  of  the  proposed  model  and 
potentially  suggest  alternate  solutions.  A  more  diffi¬ 
cult  failure  may  result  if  our  environmental  conditions 
differ  too  significantly  from  normal  human  social  inter¬ 
actions.  While  the  work  of  Reeves  &  Nass  (1996)  leads 
us  to  believe  that  this  result  will  not  occur,  this  pos¬ 
sibility  allows  us  to  draw  conclusions  only  about  our 
implementation  and  not  the  model  or  the  underlying 
biological  factors. 

Future Work

The implementation of Baron-Cohen’s model is still a
work in progress. All of the basic sensory-motor skills
have been demonstrated. The
robot  can  move  its  eyes  in  many  human-like  ways,  in¬ 
cluding  saccades,  vergence,  tracking,  and  maintaining 
fixation  through  vestibulo-ocular  and  opto-kinetic  re¬ 
flexes.  Orientation  with  the  neck  to  maximize  eye 
range  has  been  implemented,  as  well  as  coordinated  arm 
pointing.  Perceptual  components  of  eye  detection  have 
also  been  constructed;  the  robot  can  detect  and  foveate 
faces  to  obtain  high-resolution  images  of  eyes. 

These  initial  results  are  incomplete,  but  have  pro¬ 
vided  encouraging  evidence  that  the  technical  problems 
faced  by  an  implementation  of  this  nature  are  within 
our  grasp.  Cog’s  perceptual  systems  have  been  suc¬ 
cessful  at  finding  faces  and  eyes  in  real-time,  and  in 
real-world environments. Simple social behaviors, such
as eye-neck orientation and head-nod imitation, have
been easy for human observers to interpret, and those
observers have found their interactions with the robot
to be both believable and entertaining.

Our  future  work  will  focus  on  the  construction  and 
implementation  of  the  remainder  of  the  modules  from 
Baron-Cohen’s  model.  From  an  engineering  perspec¬ 
tive,  this  approach  has  already  succeeded  in  providing 
adaptive solutions to classical problems in behavior
integration, space-variant perception, and the integration
of  multiple  sensory  and  motor  modalities.  From  a  sci¬ 
entific  perspective,  we  are  optimistic  that  when  com¬ 
pleted,  this  implementation  will  provide  new  insights 
and  evaluation  methods  for  models  of  social  develop¬ 
ment. 